data storage


Multi-Agent Reinforcement Learning for Autonomous Multi-Satellite Earth Observation: A Realistic Case Study

Hady, Mohamad A., Hu, Siyi, Pratama, Mahardhika, Cao, Jimmy, Kowalczyk, Ryszard

arXiv.org Artificial Intelligence

The exponential growth of Low Earth Orbit (LEO) satellites has revolutionised Earth Observation (EO) missions, addressing challenges in climate monitoring, disaster management, and more. However, autonomous coordination in multi-satellite systems remains a fundamental challenge. Traditional optimisation approaches struggle to handle the real-time decision-making demands of dynamic EO missions, necessitating the use of Reinforcement Learning (RL) and Multi-Agent Reinforcement Learning (MARL). In this paper, we investigate RL-based autonomous EO mission planning by modelling single-satellite operations and extending to multi-satellite constellations using MARL frameworks. We address key challenges, including energy and data storage limitations, uncertainties in satellite observations, and the complexities of decentralised coordination under partial observability. By leveraging a near-realistic satellite simulation environment, we evaluate the training stability and performance of state-of-the-art MARL algorithms, including PPO, IPPO, MAPPO, and HAPPO. Our results demonstrate that MARL can effectively balance imaging and resource management while addressing non-stationarity and reward interdependency in multi-satellite coordination. The insights gained from this study provide a foundation for autonomous satellite operations, offering practical guidelines for improving policy learning in decentralised EO missions.
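The resource trade-off this abstract describes, imaging versus energy and data-storage budgets, can be made concrete with a toy single-satellite environment. The class, reward values, and greedy policy below are illustrative assumptions for exposition, not the paper's actual simulator or any MARL algorithm.

```python
class ToySatellite:
    """Minimal single-satellite model: action 0 = recharge, action 1 = image.

    Imaging earns reward only when both energy and storage allow it, which
    mirrors the resource constraints described above (illustrative only).
    """
    def __init__(self, energy=5, storage=3):
        self.energy, self.storage = energy, storage

    def step(self, action):
        if action == 1 and self.energy > 0 and self.storage > 0:
            self.energy -= 1
            self.storage -= 1
            return 1.0  # successful observation
        if action == 0:
            self.energy = min(self.energy + 1, 10)  # recharge, capped
        return 0.0  # no reward for recharging or an infeasible imaging attempt

def rollout(policy, steps=10):
    """Run one episode; `policy` maps an (energy, storage) tuple to an action."""
    sat, total = ToySatellite(), 0.0
    for _ in range(steps):
        total += sat.step(policy((sat.energy, sat.storage)))
    return total

# A greedy policy images whenever it is feasible, then falls back to recharging.
greedy = lambda obs: 1 if obs[0] > 0 and obs[1] > 0 else 0
```

With the initial budget of 3 storage slots, the greedy policy collects exactly three observations and then idles, which is the kind of myopic behavior a learned policy would need to improve on.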


NEURODNAAI: Neural pipeline approaches for the advancing dna-based information storage as a sustainable digital medium using deep learning framework

Thakur, Rakesh, Singh, Lavanya, Yashika, null, Bundawala, Manomay, Kumar, Aruna

arXiv.org Artificial Intelligence

DNA is a promising medium for digital information storage due to its exceptional density and durability. While prior studies advanced coding theory, workflow design, and simulation tools, challenges such as synthesis costs, sequencing errors, and biological constraints (GC-content imbalance, homopolymers) limit practical deployment. To address this, our framework draws from quantum parallelism concepts to enhance encoding diversity and resilience, integrating biologically informed constraints with deep learning to improve error mitigation in DNA storage. Our results show that traditional prompting or rule-based schemes fail to adapt effectively to realistic noise, whereas NeuroDNAAI achieves superior accuracy. Experiments on benchmark datasets demonstrate low bit error rates for both text and images. By unifying theory, workflow, and simulation into one pipeline, NeuroDNAAI enables scalable, biologically valid archival DNA storage.

The rapid increase in global data generation has placed unprecedented pressure on traditional storage media, including magnetic tapes, hard disks, and solid-state drives. These technologies are constrained in terms of density, durability, and sustainability, often degrading within decades and necessitating frequent migration. At the same time, forecasts indicate that the volume of digital data will soon surpass the capacity of existing storage infrastructure, creating an urgent demand for alternative paradigms. DNA has emerged as a promising medium for information storage due to its extremely high density, long-term stability, and universal biological accessibility. Despite this extraordinary theoretical potential, practical adoption remains hindered by challenges in synthesis, sequencing, and error correction. Errors such as substitutions, insertions, and deletions complicate reliable retrieval, motivating the development of novel methods capable of tolerating or correcting these distortions.
In response to these challenges, the present work proposes a modular end-to-end framework that simulates the DNA storage pipeline and introduces a Transformer-based neural decoder for robust data reconstruction. Within this system, digital information (in this case, MNIST images) is encoded into DNA sequences, passed through a configurable noise model that simulates synthesis and sequencing errors, and subsequently reconstructed using an encoder-decoder architecture.
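The encode → noise → decode pipeline described above can be sketched in a few lines, assuming a simple 2-bits-per-base mapping; the configurable noise model here injects only substitutions, and the paper's Transformer decoder is replaced by a trivial inverse mapping for illustration.

```python
import random

BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BITS = {v: k for k, v in BASE.items()}

def encode(bits):
    """Map a binary string to DNA, two bits per base (assumes even length)."""
    return "".join(BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def substitution_noise(seq, rate, seed=0):
    """Configurable channel: each base flips to a random other base with prob. `rate`."""
    rng = random.Random(seed)
    return "".join(
        rng.choice([x for x in "ACGT" if x != b]) if rng.random() < rate else b
        for b in seq
    )

def decode(seq):
    """Trivial inverse mapping (a stand-in for the learned neural decoder)."""
    return "".join(BITS[b] for b in seq)

# Noise-free round trip recovers the payload exactly; the learned decoder's
# job is to do the same after the noisy channel corrupts some bases.
payload = "0110001011110100"
assert decode(encode(payload)) == payload
```

In the actual system the decoder is trained on (noisy sequence, original bits) pairs so it can reconstruct payloads that this naive inverse mapping would get wrong.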


Responsible Data Stewardship: Generative AI and the Digital Waste Problem

Utz, Vanessa

arXiv.org Artificial Intelligence

As generative AI systems become widely adopted, they enable the creation of synthetic data at unprecedented levels across text, image, audio, and video modalities. While research has addressed the energy consumption of model training and inference, a critical sustainability challenge remains understudied: digital waste. This term refers to stored data that consumes resources without serving a specific (and/or immediate) purpose. This paper presents this terminology in the AI context and introduces digital waste as an ethical imperative within (generative) AI development, positioning environmental sustainability as core for responsible innovation. Drawing from established digital resource management approaches, we examine how other disciplines manage digital waste and identify transferable approaches for the AI community. We propose specific recommendations encompassing research directions, technical interventions, and cultural shifts to mitigate the environmental consequences of indefinite data storage. By expanding AI ethics beyond immediate concerns like bias and privacy to include intergenerational environmental justice, this work contributes to a more comprehensive ethical framework that considers the complete lifecycle impact of generative AI systems.


Dynamic Adaptation in Data Storage: Real-Time Machine Learning for Enhanced Prefetching

Cheng, Chiyu, Zhou, Chang, Zhao, Yang, Cao, Jin

arXiv.org Artificial Intelligence

The exponential growth of data storage demands has necessitated the evolution of hierarchical storage management strategies [1]. This study explores the application of streaming machine learning [3] to revolutionize data prefetching within multi-tiered storage systems. Unlike traditional batch-trained models, streaming machine learning [5] offers adaptability, real-time insights, and computational efficiency, responding dynamically to workload variations. This work designs and validates an innovative framework that integrates streaming classification models for predicting file access patterns, specifically the next file offset. Leveraging comprehensive feature engineering and real-time evaluation over extensive production traces, the proposed methodology achieves substantial improvements in prediction accuracy, memory efficiency, and system adaptability. The results underscore the potential of streaming models in real-time storage management, setting a precedent for advanced caching and tiering strategies.
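One way to make "predicting file access patterns, specifically the next file offset" concrete is a first-order Markov predictor that updates one access at a time. This is a deliberately simplified stand-in for the streaming classifiers the paper evaluates, but it shows the core property: O(1) incremental updates that adapt as the workload drifts.

```python
from collections import defaultdict, Counter

class StreamingOffsetPredictor:
    """Online next-offset model: counts offset transitions, predicts the mode.

    Each access updates a single counter, so the model tracks workload
    changes in real time -- the property that motivates streaming models
    over batch-trained ones.
    """
    def __init__(self):
        self.transitions = defaultdict(Counter)
        self.last = None

    def observe(self, offset):
        if self.last is not None:
            self.transitions[self.last][offset] += 1
        self.last = offset

    def predict(self):
        """Most frequent successor of the last-seen offset, or None if unseen."""
        succ = self.transitions.get(self.last)
        return succ.most_common(1)[0][0] if succ else None

# A sequential scan pattern: after seeing 0, 4, 8, 0, 4 the model has learned
# the transition 4 -> 8 and can prefetch the block at offset 8.
p = StreamingOffsetPredictor()
for off in [0, 4, 8, 0, 4]:
    p.observe(off)
```

A prefetcher would call `predict()` on every access and issue a read-ahead for the returned offset when the model is confident enough.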


AI is reshaping business. This is how we stay ahead of China

FOX News

We are in the midst of an artificial intelligence (AI)-driven industrial revolution. From self-driving cars to medical diagnostics to next-generation defense and homeland security capabilities, AI is reshaping nearly every industry. As the U.S. races to maintain its global leadership in AI, much of the conversation revolves around natural language processing, the reshoring of the semiconductor supply chain and powering data centers.


SemAI: Semantic Artificial Intelligence-enhanced DNA storage for Internet-of-Things

Wu, Wenfeng, Xiang, Luping, Liu, Qiang, Yang, Kun

arXiv.org Artificial Intelligence

With the rapid evolution of technologies such as the Internet of Things (IoT), global data volumes are surging exponentially, making DNA storage a prospective medium for contemporary cloud storage applications. This paper introduces a Semantic Artificial Intelligence-enhanced DNA storage (SemAI-DNA) paradigm, which differs from prevalent deep learning-based methodologies through two key modifications: 1) embedding a semantic extraction module at the encoding terminus, enabling the encoding and storage of nuanced semantic information; 2) designing a multi-read filtering model at the decoding terminus that leverages the inherent multi-copy propensity of DNA molecules to bolster system fault tolerance, coupled with a strategically optimized decoder architecture. Numerical results demonstrate SemAI-DNA's efficacy, attaining a 2.61 dB Peak Signal-to-Noise Ratio (PSNR) gain and a 0.13 improvement in Structural Similarity Index (SSIM) over conventional deep learning-based approaches.
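The multi-copy fault tolerance that the decoding stage exploits can be illustrated with a per-position majority vote across noisy reads of the same strand. The paper's learned filtering model is more sophisticated, so treat this as the baseline it improves upon.

```python
from collections import Counter

def majority_consensus(reads):
    """Consensus sequence from equal-length noisy reads of one DNA strand.

    Each position takes the most common base across the copies, so an
    isolated substitution error in any single read is voted out.
    """
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))

# Three reads of the same strand, each carrying one substitution error in a
# different position; the vote recovers the original sequence.
reads = ["ACGTA", "ACGTT", "AGGTA"]
assert majority_consensus(reads) == "ACGTA"
```

This only corrects errors that a majority of reads got right; a learned filter can additionally down-weight entire low-quality reads before voting.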


Learning Structurally Stabilized Representations for Multi-modal Lossless DNA Storage

Cao, Ben, He, Tiantian, Li, Xue, Wang, Bin, Wu, Xiaohu, Zhang, Qiang, Ong, Yew-Soon

arXiv.org Artificial Intelligence

In this paper, we present Reed-Solomon coded single-stranded representation learning (RSRL), a novel end-to-end model for learning representations for multi-modal lossless DNA storage. In contrast to existing learning-based methods, the proposed RSRL is inspired by both error-correction codec and structural biology. Specifically, RSRL first learns the representations for the subsequent storage from the binary data transformed by the Reed-Solomon codec. Then, the representations are masked by an RS-code-informed mask to focus on correcting the burst errors occurring in the learning process. With the decoded representations with error corrections, a novel biologically stabilized loss is formulated to regularize the data representations to possess stable single-stranded structures. By incorporating these novel strategies, the proposed RSRL can learn highly durable, dense, and lossless representations for the subsequent storage tasks into DNA sequences. The proposed RSRL has been compared with a number of strong baselines in real-world tasks of multi-modal data storage. The experimental results obtained demonstrate that RSRL can store diverse types of data with much higher information density and durability but much lower error rates.
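The biological-stability constraints these storage papers repeatedly invoke, balanced GC content and bounded homopolymer runs, can be screened with a few lines of code. The thresholds below are common rules of thumb, not values taken from RSRL or any specific codec.

```python
def is_biologically_valid(seq, gc_low=0.4, gc_high=0.6, max_run=3):
    """Screen a DNA sequence against two standard synthesis constraints.

    gc_low/gc_high bound the GC fraction; max_run caps homopolymer length.
    The thresholds are illustrative defaults, not taken from any paper.
    """
    gc = sum(b in "GC" for b in seq) / len(seq)
    if not gc_low <= gc <= gc_high:
        return False
    run, prev = 1, seq[0]
    for b in seq[1:]:
        run = run + 1 if b == prev else 1
        if run > max_run:
            return False
        prev = b
    return True

assert is_biologically_valid("ACGTACGT")      # 50% GC, no runs longer than 1
assert not is_biologically_valid("AAAAGCGC")  # homopolymer run of 4
```

A representation learner like RSRL effectively builds such checks into its loss, so that emitted sequences satisfy the constraints by construction rather than by post-hoc filtering.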


Towards more sustainable enterprise data and application management with cross silo Federated Learning and Analytics

Cao, Hongliu

arXiv.org Artificial Intelligence

To comply with new legal requirements and policies committed to privacy protection, more and more companies are starting to deploy cross-silo Federated Learning at global scale, where several clients/silos collaboratively train a global model under the coordination of a central server. Instead of sharing and transmitting data, clients train models on their private local data and exchange model updates. However, the carbon emission impact of cross-silo Federated Learning is poorly understood due to the lack of related work. In this study, we first analyze the sustainability of cross-silo Federated Learning across the AI product life cycle, rather than focusing only on model training, and compare it with the centralized method. We propose a more holistic quantitative cost and CO2 emission estimation method for real-world cross-silo Federated Learning settings. Secondly, we propose a novel data and application management system using cross-silo Federated Learning and analytics to make IT companies more sustainable and cost-effective.
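The "exchange model updates instead of data" protocol described above is, at its core, federated averaging: each silo trains locally and the server combines the resulting parameters. The sketch below aggregates client weight vectors by local dataset size; it is an illustration of the protocol, not the authors' system.

```python
def fed_avg(client_weights, client_sizes):
    """One FedAvg-style round: average client parameter vectors, weighted
    by local dataset size. Only parameters cross silo boundaries -- no raw
    data is shared or transmitted.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

# Two silos: the larger one (30 samples vs. 10) pulls the global model
# proportionally toward its local solution.
global_model = fed_avg([[1.0, 0.0], [0.0, 1.0]], [30, 10])
assert global_model == [0.75, 0.25]
```

The per-round communication cost (two model transfers per silo) is exactly the quantity a holistic CO2 estimate must account for alongside the local training compute.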


Quantum computing, blockchains: How the U.S. can update systems for AI potential

FOX News

Countries looking to fully utilize artificial intelligence (AI)'s potential and capabilities will need to look for upgrades to data storage and processing, turning to either blockchains or quantum computing for the way forward, experts told Fox News Digital. "You're going to have massive data storage issues and issues for computation when you get into pattern recognition," Christopher Alexander, chief analytics officer of Pioneer Development Group, told Fox News Digital. The race to develop and implement AI systems cannot occur without proper infrastructure, according to TS2 Space, a Polish internet service provider for the U.S. Army in areas like Iraq and Afghanistan. In a blog post on the company website, TS2 Space highlighted the challenges AI infrastructure faces, including "the sheer volume of data" and "the complexity of AI algorithms and models." "Developing and deploying AI applications require a deep understanding of the underlying algorithms and models, as well as the ability to fine-tune them for specific use cases," the company wrote.


Achieving a sustainable future for AI

MIT Technology Review

More compute leads to greater electricity consumption, and consequent carbon emissions. A 2019 study by researchers at the University of Massachusetts Amherst estimated that the electricity consumed during the training of a transformer, a type of deep learning algorithm, can emit more than 626,000 pounds (284 metric tons) of carbon dioxide--equal to more than 41 round-trip flights between New York City and Sydney, Australia. We are also facing an explosion of data storage. IDC projects that 180 zettabytes of data--or 180 billion terabytes--will be created in 2025. The collective energy required for data storage at this scale is enormous and will be challenging to address sustainably.
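The unit conversion in the IDC figure checks out: a zettabyte is 10^21 bytes and a terabyte is 10^12 bytes, so one zettabyte is a billion terabytes. A two-line check:

```python
ZETTABYTE = 10**21  # bytes, decimal (SI) definition
TERABYTE = 10**12   # bytes

# 180 ZB expressed in TB is 180 * 10^9, i.e. 180 billion terabytes.
tb_count = 180 * ZETTABYTE // TERABYTE
assert tb_count == 180 * 10**9
```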